Part 3 out of 3
3. Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data? Please limit your answer to 8 images and 500 words.
In order to tag the owners of each credit card and loyalty card to the car id, we would need to combine several factors together to triangulate the results. The two conditions that will be used to triangulate the data between the three datasets are:
The locations coordinates would be assigned by referencing the tourist map of Abila. However, from the earlier section, we discovered that the tourist map provided might not be accurate in locating the location coordinates as the icons on the tourist map might not represent the exact coordinates of the location.
Furthermore, the tourist map do not have all the locations marked by its logo which will not allows a full join with the locations in the cc transaction data. Table 5 shows the locations from the cc dataset whose logo could not be located visually on the tourist map of Abila. Ranking the number of transaction at each location in descending order, there are high volume of transactions at those locations and the need to map their GPS coordinate is necessary.
## Transactions on 13/01/2014 at "Frydos Autosupply n' More"
locations <- data.frame(location = cc$location) %>%
group_by(location) %>% summarize(number_transactions=n())
knitr::kable(locations %>%
dplyr::filter(location == "Abila Zacharo" |
location == "Brewed Awakenings" |
location == "Daily Dealz" |
location == "Hippokampos" |
location == "Kalami Kafenion" |
location == "Kronos Pipe and Irrigation" |
location == "Octavio's Office Supplies" |
location == "Shoppers' Delight" |
location == "Stewart and Sons Fabrication") %>%
arrange(desc(number_transactions)), "simple",
caption="Table of location with no traceable coordinates")
| location | number_transactions |
|---|---|
| Hippokampos | 171 |
| Abila Zacharo | 72 |
| Kalami Kafenion | 64 |
| Brewed Awakenings | 30 |
| Shoppers’ Delight | 20 |
| Stewart and Sons Fabrication | 18 |
| Kronos Pipe and Irrigation | 6 |
| Octavio’s Office Supplies | 4 |
| Daily Dealz | 1 |
Figure 1 shows the map marked with blue dots representing the stationary GPS coordinate of all the cars except for each employee house. The popular locations can be determined by the frequency of the blue dots at a particular location on the map.
Cross referencing with the transactions table, the locations coordinates were tag with their corresponding coordinates by cross-referencing to the car GPS data and geo-referenced data.
## Getting coordinates of car stop positions
first_gps <- gps_stop %>%
group_by(id) %>%
filter(row_number()==1) %>%
ungroup(id)
gps_pts <- gps_stop %>% ungroup(id) %>%
add_row(first_gps) %>% group_by(id) %>% arrange(timestamp) %>%
filter(!(start_vec==1 & stop_vec==1)) %>%
group_by(id) %>% arrange(timestamp) %>%
mutate( start.time = ifelse(start_vec== 0 & stop_vec==0, timestamp, NA),
start.time = ifelse(start_vec==1, timestamp,NA),
end.time=ifelse(stop_vec==1, timestamp, NA),
start.gps = ifelse(start_vec==0 & stop_vec==0, geometry,NA),
start.gps = ifelse(start_vec==1, geometry,NA),
end.gps=ifelse(stop_vec==1, geometry,NA),
end.time = ifelse(start_vec==1, lead(end.time), end.time),
end.gps = ifelse(start_vec==1, lead(end.gps), end.gps)) %>%
filter(!is.na(start.time))%>%
mutate(end.gps = ifelse(end.gps=='NULL',start.gps,end.gps),
end.time = ifelse(is.na(end.time),start.time, end.time),
start.time= as_datetime(start.time),
end.time=as_datetime(end.time),
next.start.time=lead(start.time),
driving.time=round(difftime(end.time,start.time,units='mins'),2)) %>%
dplyr::select(id, date, start.time, end.time, start.gps, end.gps,
next.start.time, driving.time) %>%
mutate(start.gps=purrr::map(start.gps, st_point) %>% st_as_sfc(crs=4326))%>%
mutate(end.gps=purrr::map(end.gps, st_point) %>% st_as_sfc(crs=4326))
car$CarID <- as_factor(car$CarID)
gps_pts <- left_join(gps_pts, car, by=c("id"="CarID"))
gps_stop_points1 <- gps_pts %>%
mutate(time.stop = difftime(next.start.time, end.time, units=c("mins")),
time.stop = as.numeric(time.stop))%>%
filter(time.stop < 300) %>%
dplyr::select(id, start.time, start.gps)
## Generate map with the stop positions in blue dots
tmap_mode("view")
map_POI<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_stop_points1)+
tm_dots(col="blue", shape=30,id="id",
popup.vars=c("Car ID"="id",
"Stationary timestamp" = "start.time",
"GPS:"="start.gps"))
tmap_leaflet(map_POI)
Figure 1: GPS stationary locations
The car id are triangulated by tabulating the centroid coordinates of the GPS data from the stationary GPS stop locations from the map. However, there are few limitations by using the methodology mentioned earlier for tagging the owners.
The interactive heatmap in Figure 2 shows the percentage that were successfully match with the car GPS and cc transaction data by the conditions mentioned earlier. The histogram was also plotted to visualise the distribution of the result. From the two visualisation, we observed that the methodology yield some high percentage match for the car id owner with the cc owner.
# Tagging location coordinates
location_tag <- data.frame(location = c(locations$location,"GAStech"),
long =c(centroid(rbind(c(24.82590612, 36.05102229),c(24.82591819, 36.05092013),c(24.82598413, 36.05097547)))[1],
centroid(rbind(c(24.84592966, 36.07443715),c(24.84598782, 36.07434876),c(24.84595026, 36.07437836)))[1],
centroid(rbind(c(24.85097804, 36.06349268),c(24.85099445, 36.06342076),c(24.85103178, 36.06348173)))[1],
centroid(rbind(c(24.87617634, 36.07713037),c(24.87621582, 36.07713598),c(24.87619872, 36.07715385)))[1],
centroid(rbind(c(24.85626503, 36.07529323),c(24.85631411, 36.07523202),c(24.85634841, 36.07528136)))[1],
centroid(rbind(c(24.85089145, 36.08172086),c(24.85096025, 36.08176242),c(24.85087799, 36.08180554)))[1],
centroid(rbind(c(24.90119998, 36.05402165),c(24.90128202, 36.05408823),c(24.90116585, 36.05411015)))[1],
NA,
centroid(rbind(c(24.88089399, 36.05851786),c(24.88092086, 36.05858619),c(24.8808655, 36.05856303)))[1],
centroid(rbind(c(24.8951996, 36.07073983),c(24.89517891, 36.07062423),c(24.89526281, 36.07069274)))[1],
centroid(rbind(c(24.88983886, 36.05469486),c(24.88978433, 36.05463184),c(24.88977321, 36.05467589)))[1],
centroid(rbind(c(24.86416839, 36.07332041),c(24.86417651, 36.07336116),c(24.86419582, 36.07332868)))[1],
NA,
centroid(rbind(c(24.86068835, 36.08962196),c(24.86068191, 36.08954231),c(24.8607611, 36.08960361)))[1],
centroid(rbind(c(24.84132949, 36.07213193),c(24.84134818, 36.07212045),c(24.4134819, 36.07212044)))[1],
centroid(rbind(c(24.905573, 36.06044638),c(24.90561679, 36.06033304),c(24.90568587, 36.06040053)))[1],
centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[1],
centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[1],
centroid(rbind(c(24.90096913, 36.05842562),c(24.90107066, 36.05844726),c(24.90097455, 36.05850897)))[1],
centroid(rbind(c(24.88586605, 36.063639),c(24.88595361, 36.06364584),c(24.88586737, 36.06371539)))[1],
centroid(rbind(c(24.85756422, 36.07660977),c(24.85763811, 36.07664766),c(24.857573, 36.07669909)))[1],
centroid(rbind(c(24.87330651, 36.06751231),c(24.87335583, 36.06750587),c(24.87333867, 36.06755141)))[1],
centroid(rbind(c(24.85237319, 36.06582037),c(24.85241027, 36.06582475),c(24.85237372, 36.06584816)))[1],
centroid(rbind(c(24.89986767, 36.05442391),c(24.89996154, 36.05448329),c(24.89987365, 36.05453273)))[1],
centroid(rbind(c(24.84983351, 36.06587998),c(24.84983936, 36.06582196),c(24.8497762, 36.06583535)))[1],
NA,
centroid(rbind(c(24.88551872, 36.05840982),c(24.88542068, 36.0584603), c(24.88553455, 36.05844325)))[1],
centroid(rbind(c(24.83307421, 36.0653098),c(24.83314028, 36.06523446), c(24.84143955, 36.06403449),c(24.84141463, 36.06410072)))[1],
NA,
centroid(rbind(c(24.87077341, 36.05196196),c(24.87081903, 36.05192066),c(24.87083665, 36.05197804)))[1],
centroid(rbind(c(24.85227441, 36.06324941),c(24.85226894, 36.06330479),c(24.8523291, 36.0632684)))[1],
NA,NA,
centroid(rbind(c(24.87148791, 36.06774029),c(24.8714995, 36.06774623),c(24.87149104, 36.06776587)))[1],
centroid(rbind(c(24.87956897, 36.04802112),c(24.8795714, 36.04804908), c(24.8795745, 36.0480309)))[1]),
lat = c(centroid(rbind(c(24.82590612, 36.05102229),c(24.82591819, 36.05092013),c(24.82598413, 36.05097547)))[2],
centroid(rbind(c(24.84592966, 36.07443715),c(24.84598782, 36.07434876),c(24.84595026, 36.07437836)))[2],
centroid(rbind(c(24.85097804, 36.06349268),c(24.85099445, 36.06342076),c(24.85103178, 36.06348173)))[2],
centroid(rbind(c(24.87617634, 36.07713037),c(24.87621582, 36.07713598),c(24.87619872, 36.07715385)))[2],
centroid(rbind(c(24.85626503, 36.07529323),c(24.85631411, 36.07523202),c(24.85634841, 36.07528136)))[2],
centroid(rbind(c(24.85089145, 36.08172086),c(24.85096025, 36.08176242),c(24.85087799, 36.08180554)))[2],
centroid(rbind(c(24.90119998, 36.05402165),c(24.90128202, 36.05408823),c(24.90116585, 36.05411015)))[2],
NA,
centroid(rbind(c(24.88089399, 36.05851786),c(24.88092086, 36.05858619),c(24.8808655, 36.05856303)))[2],
centroid(rbind(c(24.8951996, 36.07073983),c(24.89517891, 36.07062423),c(24.89526281, 36.07069274)))[2],
centroid(rbind(c(24.88983886, 36.05469486),c(24.88978433, 36.05463184),c(24.88977321, 36.05467589)))[2],
centroid(rbind(c(24.86416839, 36.07332041),c(24.86417651, 36.07336116),c(24.86419582, 36.07332868)))[2],
NA,
centroid(rbind(c(24.86068835, 36.08962196),c(24.86068191, 36.08954231),c(24.8607611, 36.08960361)))[2],
centroid(rbind(c(24.84132949, 36.07213193),c(24.84134818, 36.07212045),c(24.4134819, 36.07212044)))[2],
centroid(rbind(c(24.905573, 36.06044638),c(24.90561679, 36.06033304),c(24.90568587, 36.06040053)))[2],
centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[2],
centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[2],
centroid(rbind(c(24.90096913, 36.05842562),c(24.90107066, 36.05844726),c(24.90097455, 36.05850897)))[2],
centroid(rbind(c(24.88586605, 36.063639),c(24.88595361, 36.06364584),c(24.88586737, 36.06371539)))[2],
centroid(rbind(c(24.85756422, 36.07660977),c(24.85763811, 36.07664766),c(24.857573, 36.07669909)))[2],
centroid(rbind(c(24.87330651, 36.06751231),c(24.87335583, 36.06750587),c(24.87333867, 36.06755141)))[2],
centroid(rbind(c(24.85237319, 36.06582037), c(24.85241027, 36.06582475),c(24.85237372, 36.06584816)))[2],
centroid(rbind(c(24.89986767, 36.05442391),c(24.89996154, 36.05448329), c(24.89987365, 36.05453273)))[2],
centroid(rbind(c(24.84983351, 36.06587998),c(24.84983936, 36.06582196),c(24.8497762, 36.06583535)))[2],
NA,
centroid(rbind(c(24.83307421, 36.0653098),c(24.83314028, 36.06523446), c(24.84143955, 36.06403449),c(24.84141463, 36.06410072)))[1],
centroid(rbind(c(24.88551872, 36.05840982),c(24.88542068, 36.0584603), c(24.88553455, 36.05844325)))[2],
NA,
centroid(rbind(c(24.87077341, 36.05196196),c(24.87081903, 36.05192066),c(24.87083665, 36.05197804)))[2],
centroid(rbind(c(24.85227441, 36.06324941),c(24.85226894, 36.06330479),c(24.8523291, 36.0632684)))[2],
NA,NA,
centroid(rbind(c(24.87148791, 36.06774029),c(24.8714995, 36.06774623),c(24.87149104, 36.06776587)))[2],
centroid(rbind(c(24.87956897, 36.04802112),c(24.8795714, 36.04804908), c(24.8795745, 36.0480309)))[2]))
location_tag <- location_tag %>% na.omit()
location_tag <- st_as_sf(location_tag, coords=c("long","lat"), crs=4326)
## join GPS data with transaction data with location coordinates
final_trans_gps <- inner_join(final_trans_1, location_tag, by=c("location")) %>%
rename(loc.coord=geometry)
## Join with car GPS and tag the location to car gps
gps_match <- final_trans_gps %>%
left_join(gps_pts , by=c("date"))%>%
group_by(last4ccnum) %>% arrange(datetime) %>%
filter(datetime > end.time & datetime <= next.start.time + minutes(30)) %>%
mutate(diff.dist = st_distance(loc.coord, end.gps, by_element=TRUE),
diff.dist = as.numeric(diff.dist)) %>%
filter(diff.dist <500)
tagging <-gps_match %>%group_by(last4ccnum, id)%>%
summarize(tag=n()) %>% arrange(desc(tag))
## Get total count of transactions minus the 4 locations per cc num
trans_collapse <- cc %>% mutate(last4ccnum=as_factor(last4ccnum)) %>%
filter(!(location %in% c("Bean There Done That",
"Brewed Awakenings",
"Coffee Shack",
"Jack's Magical Beans"))) %>%
group_by(last4ccnum) %>% summarize(total=n())
## Limit to top 3 match only by percentage
tagging_cc_gps <- left_join(tagging, trans_collapse, by=c("last4ccnum")) %>%
mutate(percent=round(tag/total*100,2))
tag_plot<-ggplot(tagging_cc_gps, aes(x=id, y=last4ccnum,fill=percent))+
geom_tile() + scale_fill_gradient(low="sienna1", high="navyblue") +
xlab("Car ID") +ylab("CC last 4 number")+
labs(fill="% match")
histogram<-ggplot(tagging_cc_gps,aes(percent))+geom_histogram(binwidth=5)+
stat_function(fun=dnorm,aes(color="red"),
args=list(mean=mean(tagging_cc_gps$percent),
sd=sd(tagging_cc_gps$percent)))
ggplotly(tag_plot) %>% layout(hoverlabel=list(bgcolor="white"))
Figure 2: Car GPS tagging to CC number
Figure 2: Car GPS tagging to CC number
Hence, we can confidently infer that matches over 75% will be accurate. However, as there are more cc owners (55 unique owners) than car owners (35 unique car id) and the truck drivers share vehicles (5 unique truck id), we will drop the truck drivers with car id of 100 and above. Observation of the heatmap in figure 2 reveals that car id 23, car id 29 and car id 30 has matches of more than one cc number and car id 28 does not have a match with more than 75%.
From Table 6, we observe that car id 23 matches to three unique cc number with matches over 75%. The highest percentage match to cc 3484 at 91.43% shows high probability for inference, hence the observation that matches to cc 8202 and 8411 will be dropped.
For car id 29 and 30, the matches to cc number percentage are relatively high and defers less than 10%. Further investigation on the GPS map location will be performed to verify which match to retain.
## Get the match of car id to cc last4ccnum
tagging <- tagging_cc_gps %>% mutate(id=as.character(id), id=as.numeric(id)) %>%
filter(percent>=75 & id<100)
knitr::kable(tagging %>% filter(id==23 | id==29 | id==30) %>%
arrange(id), "simple",
caption="Table of employees record and their cc and loyalty number")
| last4ccnum | id | tag | total | percent |
|---|---|---|---|---|
| 3484 | 23 | 31 | 35 | 88.57 |
| 8202 | 23 | 25 | 33 | 75.76 |
| 8411 | 23 | 25 | 32 | 78.12 |
| 3547 | 29 | 18 | 20 | 90.00 |
| 5921 | 29 | 13 | 14 | 92.86 |
| 6901 | 30 | 31 | 37 | 83.78 |
| 8202 | 30 | 25 | 33 | 75.76 |
final_tagging <- tagging %>%
filter(!(last4ccnum==8202 & id==23), !(last4ccnum==8411 & id ==23))
Investigation of car id 28 low cc transactions matches was visualised in Figure 3 and it revealed that the GPS coordinates of car id 28 has lots of noise. The noise in the GPS line caused a wider spread of GPS line in the visualisation on the map and also zig-zag incoherent GPS path. This most probably signifies a faulty GPS signal on the car.
Secondly, we observe that the stop position was not accurate. For example, the frequency of GPS stop coordinates at the extreme south of the map should be at GAStech. Hence, the GPS stop coordinates seems to deviate in the North-West direction. The most probable explanation will be a faulty GPS system since the GPS points were noisy and were not correctly geo-referenced on the map.
## Map geometry for original car id 28 data
gps_path5 <- gps_sf %>%
filter(id==28) %>%
group_by(id) %>%
summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_28_points <- gps_stop %>% filter(id ==28)
## Plot interactive map
tmap_mode("view")
map5<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_path5)+
tm_lines() +
tm_shape(gps_28_points)+
tm_dots(col="blue")
tmap_leaflet(map5)
Figure 3: Original GPS for car id 28
After re-calibrating the GPS coordinates for car id 28, Figure 4 shows the GPS movement data for car id 28. With the re-calibrated GPS data, we would match it with the cc transaction data to infer which cc belongs to car id 28.
From the map in Figure 4, the unqiue observation was that car id 28 visited Ahaggo Museum on the 18th and 19th of Jan and frequently patronise Jack’s Magical Beans and Ouzeri Elian over the two weeks.
From the cc transaction table, a search of Ahaggo Museum revealed that cc 1286, 7384 and 9241 made transactions on the 18th and 19th of Jan. Next, a search of Jack’s Magical Beans shows that only cc 9241 out of the three cc made transactions at the location. Lastly, a search of Ouzeri Elian on the datatable reveals that cc 9241 made 6 transactions at the location. Hence, we are confident to infer that car id 28 is the owner of cc 9241.
## Map geometry for re-calibrated Car id 28
gps28 <- gps %>% filter(id==28) %>%
mutate(long = long +0.005,
lat=lat-0.002)
gps_sf28 <- st_as_sf(gps28, coords=c("long","lat"), crs=4326)
gps_path28 <- gps_sf28 %>% group_by(id) %>%
summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps28_pt <- gps_sf28 %>%
group_by(id) %>% arrange(timestamp) %>%
mutate(start_diff= as.numeric(timestamp - lag(timestamp,default=first(timestamp)))/60,
stop_diff= as.numeric(lead(timestamp)-timestamp)/60,
date = as.Date(timestamp)) %>%
rename(gps.coord=geometry) %>%
filter(start_diff>5 | stop_diff >5) %>%
mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))
## Plot interactive map
tmap_mode("view")
map6<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_path28)+
tm_lines() +
tm_shape(gps28_pt)+
tm_dots(col="blue")
tmap_leaflet(map6)
Figure 4: Re-calibrated GPS for car id 28
final_tagging <- final_tagging %>%
dplyr::select(last4ccnum, id) %>%
mutate(last4ccnum = as.character(last4ccnum),
id = as.character(id)) %>%
bind_rows(c(last4ccnum="9241", id="28"))
Next, we will focus on car id 29 where it matches 90% of cc 3547 transactions and 100% of cc 5921. The high proportion of matches on both credit card warrants some investigation into the data.
Looking at table 7 for both cc number, we observe that cc 3547 has transactions between 12/01/2014 to 19/01/2014 and cc 5921 has transactions between 06/01/2014 to 10/01/2014. Cross-referencing the GPS data for car id 29 in Figure 5, we can observe that the cc transactions matches the GPS data of car id 29. A possible deduction is that the owner of car id 29 used both cc card as there was no overlap in the transaction dates for both cc. Possible scenario could be that the owner switch the CC from 5921 to 3547 after 10/01/2014. However, there might be missing data on 11/01/2014 where it was not captured on both cc. Hence, we will tag car id 29 to both cc 5921 and 3547.
gps_path29 <- gps_sf %>%
filter(id==29) %>%
group_by(id) %>%
summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps29_pt <- gps_stop_points1 %>% filter(id==29)
tmap_mode("view")
map7<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_path29)+
tm_lines() +
tm_shape(gps29_pt)+
tm_dots(col="blue")
cc3547 <- cc %>% filter(last4ccnum==3547) %>% dplyr::select(-datetime, -date)
cc5921 <- cc %>% filter(last4ccnum==5921) %>% dplyr::select(-datetime, -date)
knitr::kable(list(cc3547,cc5921),caption="Transactions for CC 3547 & 5921")
|
|
tmap_leaflet(map7)
Figure 5: GPS for car id 29
Lastly, we will look at car id 30 with cc 6901 and 8202. The GPS data for car id 30 was visualise in Figure 6 and the transaction from cc 6901 and 8202 in table 8.
Comparing the GPS data map and cc translation data, we focused on locations with a lower frequency of visit and locations in a less congested area for easier verification. From the 3 locations and transaction details below, we can deduce that cc 6901 matches car id 30.
gps_path_30 <- gps_sf %>%
filter(id==30) %>%
group_by(id) %>%
summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_stop_points30 <- gps_pts %>%
mutate(time.stop = difftime(next.start.time, end.time, units=c("mins")),
time.stop = as.numeric(time.stop))%>%
filter(time.stop < 300 & id==30) %>%
dplyr::select(id, start.time, start.gps)
## Plot interactive map
tmap_mode("view")
map8<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_path_30) +
tm_lines(col ="red") +
tm_shape(gps_stop_points30)+
tm_dots(col="blue", shape=30)
tmap_leaflet(map8)
Figure 6: GPS for car id 30
cc6901 <- cc %>% filter(last4ccnum==6901) %>% dplyr::select(-datetime, -date)
cc8202 <- cc %>% filter(last4ccnum==8202) %>% dplyr::select(-datetime, -date)
knitr::kable(list(cc6901,cc8202),caption="Transactions for CC 6901 & 8202")
|
|
The tagging of all 35 car owners (excluding truck drivers) have been completed and verified.
4. Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships.
To visusalise potential relationships relationships, network analysis was used to look at the relationships. Figure 7 shows an interactive network analysis of each car ID employee and the locations that they made transactions at with their GAStech cc. From the network analysis throughout the two weeks of data, we can uncover some relationships among employees.
cc_data <- cc %>% mutate(day=lubridate::day(datetime), hour=lubridate::hour(datetime))
sources <- cc_data %>% mutate(hour=lubridate::hour(datetime)) %>%
distinct(last4ccnum) %>% left_join(final_tagging, by=c("last4ccnum")) %>%
mutate(name=paste(LastName,FirstName)) %>%
rename(label = name) %>% drop_na(id) %>%
mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),"Driver",CurrentEmploymentType))
destinations <- cc_data %>%
distinct(location) %>%
rename(label = location)
cc_nodes <- full_join(sources,
destinations,
by = "label") %>% rename(car_id=id)
cc_nodes <- cc_nodes %>%
rowid_to_column("id") %>%
mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
"Locations",CurrentEmploymentType),
title=label) %>%
rename(group=CurrentEmploymentType)
edges <- cc_data %>%
mutate(last4ccnum = as.character(last4ccnum)) %>%
filter(last4ccnum %in% final_tagging$last4ccnum) %>%
group_by(last4ccnum, location, day, hour) %>%
summarise(weight = n()) %>%
ungroup()
cc_edges <- edges %>%
inner_join(cc_nodes,by = c("last4ccnum")) %>%
rename(from = id)
cc_edges <- cc_edges %>%
inner_join(cc_nodes,by = c("location" = "label")) %>%
rename(to = id) %>%
dplyr::select(from, to,day, hour, weight) %>%
mutate(time_bin = case_when(hour>=0&hour<6~"Midnight",
hour>=6&hour<12~"Morning",
hour>=12&hour<18~"Afternoon",
hour>=18~"Night"),
weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
day.week = case_when(day==6|day==13~"Monday",
day==7|day==14~"Tuesday",
day==8|day==15~"Wednesday",
day==9|day==16~"Thursday",
day==10|day==17~"Friday",
day==11|day==18~"Saturday",
day==12|day==19~"Sunday",))
visNetwork(cc_nodes, cc_edges, main="Network analysis by location and employee") %>%
visIgraphLayout(layout = "layout_on_grid") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visLegend()
Figure 7: Network analysis by location and employee
knitr::kable(cc %>% mutate(last4ccnum=as.character(last4ccnum)) %>%
left_join(final_tagging, by=c("last4ccnum")) %>%
filter(location=="Chostus Hotel") %>%
select(name, CurrentEmploymentType, CurrentEmploymentTitle,
location, timestamp, price),
caption="Table of transaction at Chostus Hotel")
| name | CurrentEmploymentType | CurrentEmploymentTitle | location | timestamp | price |
|---|---|---|---|---|---|
| Orilla Elsa | Engineering | Drill Technician | Chostus Hotel | 01/08/2014 12:56 | 107.51 |
| Tempestad Brand | Engineering | Drill Technician | Chostus Hotel | 01/08/2014 13:19 | 111.89 |
| Tempestad Brand | Engineering | Drill Technician | Chostus Hotel | 01/10/2014 13:08 | 133.25 |
| Orilla Elsa | Engineering | Drill Technician | Chostus Hotel | 01/10/2014 13:11 | 197.41 |
| Orilla Elsa | Engineering | Drill Technician | Chostus Hotel | 01/14/2014 13:17 | 109.54 |
| Tempestad Brand | Engineering | Drill Technician | Chostus Hotel | 01/14/2014 13:21 | 113.08 |
| Tempestad Brand | Engineering | Drill Technician | Chostus Hotel | 01/17/2014 13:49 | 114.22 |
| Orilla Elsa | Engineering | Drill Technician | Chostus Hotel | 01/17/2014 13:54 | 159.62 |
| NA | NA | NA | Chostus Hotel | 01/18/2014 12:03 | 600.00 |
bean_cust <- final_tagging %>% filter(name == "Frente Birgitta"|
name == "Calzas Axel"|
name == "Frente Vira"|
name == "Azada Lars"|
name == "Balas Felix"|
name == "Dedos Lidelse"|
name == "Cazar Gustav")
gps_stop_points_bean <- gps_pts %>%
filter(id %in% bean_cust$id) %>%
mutate(time.stop = difftime(next.start.time, end.time,units=c("mins")),
time.stop = as.numeric(time.stop),
name=paste(LastName,FirstName))%>%
filter(time.stop < 300 ) %>%
dplyr::select(id, start.time, start.gps,name) %>%
mutate(id=as.character(id))
## Plot interactive map
tmap_mode("view")
map_bean<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_stop_points_bean)+
tm_dots(col="name",palette="Dark2",id="start.time")
tmap_leaflet(map_bean)
Figure 8: Stationary GPS points of Bean There Done That customers
To investigate non-official relationships, we will focus on after working hours transactions. The network analysis was drilled down to transactions performed on Weekday Nights only and dining locations that had transactions in the afternoon or night to reduce cluttering of the network analysis. Figure 9 shows the network analysis for weekday nights transactions only. The edge line connecting the employees to location are colored by day to visualize if any group of employees visited a particular location on the same day in the night.
sources <- cc_data %>% mutate(hour=lubridate::hour(datetime)) %>%
distinct(last4ccnum) %>% left_join(final_tagging, by=c("last4ccnum")) %>%
mutate(name=paste(LastName,FirstName)) %>%
rename(label = name) %>% drop_na(id) %>%
mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
"Driver",CurrentEmploymentType))
destinations <- cc_data %>%
filter(location =="Ouzeri Elian"|
location=="Guy's Gyros"|
location=="Katerina's Cafe"|
location=="Hippokampos"|
location=="Abila Zacharo"|
location=="Gelatogalore"|
location=="Kalami Kafenion"|
location=="Chostus Hotel") %>%
distinct(location) %>%
rename(label = location)
cc_nodes <- full_join(sources,
destinations,
by = "label") %>% rename(car_id=id)
cc_nodes <- cc_nodes %>%
rowid_to_column("id") %>%
mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
"Locations",CurrentEmploymentType),
title=label) %>%
rename(group=CurrentEmploymentType)
edges <- cc_data %>%
mutate(last4ccnum = as.character(last4ccnum)) %>%
filter(last4ccnum %in% final_tagging$last4ccnum) %>%
group_by(last4ccnum, location, day, hour) %>%
summarise(weight = n()) %>%
ungroup()
cc_edges <- edges %>%
inner_join(cc_nodes,by = c("last4ccnum")) %>%
rename(from = id)
cc_edges <- cc_edges %>%
inner_join(cc_nodes,by = c("location" = "label")) %>%
rename(to = id) %>%
dplyr::select(from, to,day, hour, weight) %>%
mutate(time_bin = case_when(hour>=0&hour<6~"Midnight",
hour>=6&hour<12~"Morning",
hour>=12&hour<18~"Afternoon",
hour>=18~"Night"),
weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
day.week = case_when(day==6|day==13~"Monday",
day==7|day==14~"Tuesday",
day==8|day==15~"Wednesday",
day==9|day==16~"Thursday",
day==10|day==17~"Friday",
day==11|day==18~"Saturday",
day==12|day==19~"Sunday",))
cc_edges_dn<- cc_edges %>%
filter(time_bin=="Night", weekday.weekend=="Weekday") %>%
mutate(color=rainbow(max(day))[day])
# cc_edges_dn$color <- palette(rainbow(7))[cc_edges_dn$day]
visNetwork(cc_nodes, cc_edges_dn,
main="Network analysis by location and employee") %>%
visIgraphLayout(layout = "layout_on_grid") %>%
visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
visEdges(smooth=FALSE, color="color") %>%
visLegend()
Figure 9: Network analysis on Weekday Night
final_cc <- cc %>% mutate(left4ccnum=as.character(last4ccnum)) %>%
left_join(final_tagging, by="last4ccnum") %>%
mutate(day=lubridate::day(datetime), hour=lubridate::hour(datetime),
time_bin = case_when(hour>=0&hour<6~"Midnight",
hour>=6&hour<12~"Morning",
hour>=12&hour<18~"Afternoon",
hour>=18~"Night"),
weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
day.week = case_when(day==6|day==13~"Monday",
day==7|day==14~"Tuesday",
day==8|day==15~"Wednesday",
day==9|day==16~"Thursday",
day==10|day==17~"Friday",
day==11|day==18~"Saturday",
day==12|day==19~"Sunday",))
knitr::kable(final_cc %>%
filter(weekday.weekend=="Weekday"&time_bin=="Night") %>%
filter(location =="Ouzeri Elian"&(name=="Baza Isak"|name=="Calixto Nils")) %>%
select(location, datetime, name, price, CurrentEmploymentType,CurrentEmploymentTitle)
, "simple",
caption="Baza Isak and Calixto Nils transactions at Ouzeri Elian on Weekdays Nights")
| location | datetime | name | price | CurrentEmploymentType | CurrentEmploymentTitle |
|---|---|---|---|---|---|
| Ouzeri Elian | 2014-01-08 21:16:00 | Calixto Nils | 30.81 | Information Technology | IT Helpdesk |
| Ouzeri Elian | 2014-01-08 21:17:00 | Baza Isak | 29.85 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-09 19:42:00 | Baza Isak | 27.08 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-10 18:52:00 | Baza Isak | 19.92 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-13 19:30:00 | Calixto Nils | 28.75 | Information Technology | IT Helpdesk |
| Ouzeri Elian | 2014-01-14 20:32:00 | Baza Isak | 11.86 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-15 20:29:00 | Baza Isak | 23.18 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-16 20:25:00 | Baza Isak | 23.89 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-16 20:28:00 | Calixto Nils | 9.91 | Information Technology | IT Helpdesk |
| Ouzeri Elian | 2014-01-17 19:40:00 | Baza Isak | 38.60 | Information Technology | IT Technician |
| Ouzeri Elian | 2014-01-17 20:28:00 | Calixto Nils | 35.81 | Information Technology | IT Helpdesk |
Apart from the transactional data performed by employees, we will look into the GPS data to observe for any gathering and potential relationships. Figure 10 shows every employee car GPS stationary coordinates.
gps_stop_points <- gps_pts %>%
mutate(time.stop = difftime(next.start.time, end.time,units=c("mins")),
time.stop = as.numeric(time.stop),
time.location = difftime(next.start.time,end.time),
time.location = as.numeric(time.location),
name=paste(LastName,FirstName),
id=as.character(id),id=as.numeric(id),
gps.coord=end.gps)%>%
filter(id<100 ) %>%
dplyr::select(name, CurrentEmploymentType,CurrentEmploymentTitle,
end.time, end.gps,next.start.time,time.location) %>%
rename(Arrival.Time=end.time, Coordinate=end.gps,
Next_move_off_time=next.start.time,Time_at_location=time.location)
## Plot interactive map
tmap_mode("view")
fullmap<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_stop_points)+
tm_markers()
tmap_leaflet(fullmap)
Figure 10: Stationary GPS points of all cars
knitr::kable(gps_stop_points %>% st_drop_geometry() %>%
filter(name=="Dedos Lidelse", Time_at_location>450) %>%
select(name, Arrival.Time, Coordinate,
Next_move_off_time, Time_at_location),
"simple",
caption="Table of transaction at Chostus Hotel")
| name | Arrival.Time | Coordinate | Next_move_off_time | Time_at_location |
|---|---|---|---|---|
| Dedos Lidelse | 2014-01-06 08:13:09 | POINT (24.87957 36.04803) | 2014-01-06 12:24:01 | 15052 |
| Dedos Lidelse | 2014-01-06 12:27:23 | POINT (24.89995 36.05454) | 2014-01-06 13:58:01 | 5438 |
| Dedos Lidelse | 2014-01-06 14:01:23 | POINT (24.87957 36.04803) | 2014-01-06 17:32:01 | 12638 |
| Dedos Lidelse | 2014-01-06 17:37:39 | POINT (24.89608 36.06333) | 2014-01-06 19:01:01 | 5002 |
| Dedos Lidelse | 2014-01-06 19:04:05 | POINT (24.89993 36.05448) | 2014-01-06 20:13:01 | 4136 |
| Dedos Lidelse | 2014-01-06 20:18:01 | POINT (24.89612 36.06343) | 2014-01-07 07:01:01 | 38580 |
| Dedos Lidelse | 2014-01-07 07:04:46 | POINT (24.88594 36.0637) | 2014-01-07 07:43:01 | 2295 |
| Dedos Lidelse | 2014-01-07 07:50:09 | POINT (24.87957 36.04803) | 2014-01-07 12:12:01 | 15712 |
| Dedos Lidelse | 2014-01-07 12:18:16 | POINT (24.85805 36.05974) | 2014-01-07 14:00:01 | 6105 |
| Dedos Lidelse | 2014-01-07 14:06:16 | POINT (24.87957 36.04803) | 2014-01-07 17:40:01 | 12825 |
| Dedos Lidelse | 2014-01-07 17:45:39 | POINT (24.89613 36.06337) | 2014-01-07 18:47:01 | 3682 |
| Dedos Lidelse | 2014-01-07 18:50:05 | POINT (24.89997 36.0545) | 2014-01-07 20:17:01 | 5216 |
| Dedos Lidelse | 2014-01-07 20:20:05 | POINT (24.89616 36.06332) | 2014-01-08 07:12:01 | 39116 |
| Dedos Lidelse | 2014-01-08 07:15:46 | POINT (24.88589 36.06366) | 2014-01-08 07:48:01 | 1935 |
| Dedos Lidelse | 2014-01-08 07:55:09 | POINT (24.87957 36.04803) | 2014-01-08 12:25:01 | 16192 |
| Dedos Lidelse | 2014-01-08 12:28:42 | POINT (24.90247 36.05584) | 2014-01-08 13:58:01 | 5359 |
| Dedos Lidelse | 2014-01-08 14:01:42 | POINT (24.87957 36.04803) | 2014-01-08 17:43:01 | 13279 |
| Dedos Lidelse | 2014-01-08 17:48:39 | POINT (24.89616 36.06341) | 2014-01-08 19:47:01 | 7102 |
| Dedos Lidelse | 2014-01-08 19:52:01 | POINT (24.89989 36.05452) | 2014-01-08 21:17:01 | 5100 |
| Dedos Lidelse | 2014-01-08 21:22:01 | POINT (24.89617 36.0634) | 2014-01-09 07:17:01 | 35700 |
| Dedos Lidelse | 2014-01-09 07:20:46 | POINT (24.88587 36.06365) | 2014-01-09 07:59:01 | 2295 |
| Dedos Lidelse | 2014-01-09 08:06:09 | POINT (24.87957 36.04803) | 2014-01-09 12:00:01 | 14032 |
| Dedos Lidelse | 2014-01-09 12:08:17 | POINT (24.85103 36.06349) | 2014-01-09 13:50:01 | 6104 |
| Dedos Lidelse | 2014-01-09 13:58:17 | POINT (24.87957 36.04802) | 2014-01-09 17:34:01 | 12944 |
| Dedos Lidelse | 2014-01-09 17:39:39 | POINT (24.89615 36.06342) | 2014-01-10 07:08:01 | 48502 |
| Dedos Lidelse | 2014-01-10 07:11:46 | POINT (24.8859 36.06365) | 2014-01-10 07:44:01 | 1935 |
| Dedos Lidelse | 2014-01-10 07:51:09 | POINT (24.87958 36.04802) | 2014-01-10 12:19:01 | 16072 |
| Dedos Lidelse | 2014-01-10 12:27:17 | POINT (24.851 36.06342) | 2014-01-10 14:08:01 | 6044 |
| Dedos Lidelse | 2014-01-10 14:16:17 | POINT (24.87957 36.04802) | 2014-01-10 17:49:01 | 12764 |
| Dedos Lidelse | 2014-01-10 17:54:39 | POINT (24.89615 36.06334) | 2014-01-10 18:50:01 | 3322 |
| Dedos Lidelse | 2014-01-10 18:59:30 | POINT (24.86038 36.08545) | 2014-01-10 23:30:01 | 16231 |
| Dedos Lidelse | 2014-01-10 23:39:30 | POINT (24.89611 36.0634) | 2014-01-11 18:51:01 | 69091 |
| Dedos Lidelse | 2014-01-11 18:54:05 | POINT (24.8999 36.05446) | 2014-01-11 20:54:01 | 7196 |
| Dedos Lidelse | 2014-01-11 20:57:00 | POINT (24.89635 36.06331) | 2014-01-12 12:30:01 | 55981 |
| Dedos Lidelse | 2014-01-12 12:38:28 | POINT (24.85762 36.07668) | 2014-01-12 14:07:01 | 5313 |
| Dedos Lidelse | 2014-01-12 14:15:28 | POINT (24.89613 36.06335) | 2014-01-12 18:24:01 | 14913 |
| Dedos Lidelse | 2014-01-12 18:27:05 | POINT (24.89991 36.05447) | 2014-01-12 20:53:01 | 8756 |
| Dedos Lidelse | 2014-01-12 20:56:03 | POINT (24.89614 36.06337) | 2014-01-13 07:05:01 | 36538 |
| Dedos Lidelse | 2014-01-13 07:23:06 | POINT (24.85092 36.08183) | 2014-01-13 07:52:01 | 1735 |
| Dedos Lidelse | 2014-01-13 08:12:11 | POINT (24.87958 36.04802) | 2014-01-13 12:25:01 | 15170 |
| Dedos Lidelse | 2014-01-13 12:28:23 | POINT (24.89994 36.05446) | 2014-01-13 13:43:01 | 4478 |
| Dedos Lidelse | 2014-01-13 13:46:23 | POINT (24.87957 36.04802) | 2014-01-13 17:48:01 | 14498 |
| Dedos Lidelse | 2014-01-13 17:53:39 | POINT (24.89612 36.06332) | 2014-01-13 19:00:01 | 3982 |
| Dedos Lidelse | 2014-01-13 19:05:01 | POINT (24.89998 36.05449) | 2014-01-13 21:04:01 | 7140 |
| Dedos Lidelse | 2014-01-13 21:09:01 | POINT (24.89612 36.06339) | 2014-01-14 07:43:01 | 38040 |
| Dedos Lidelse | 2014-01-14 07:46:46 | POINT (24.88589 36.06364) | 2014-01-14 07:57:01 | 615 |
| Dedos Lidelse | 2014-01-14 08:04:09 | POINT (24.87957 36.04803) | 2014-01-14 12:05:01 | 14452 |
| Dedos Lidelse | 2014-01-14 12:08:42 | POINT (24.90256 36.05573) | 2014-01-14 13:50:01 | 6079 |
| Dedos Lidelse | 2014-01-14 13:53:42 | POINT (24.87958 36.04802) | 2014-01-14 17:41:01 | 13639 |
| Dedos Lidelse | 2014-01-14 17:46:39 | POINT (24.89607 36.06341) | 2014-01-14 19:14:01 | 5242 |
| Dedos Lidelse | 2014-01-14 19:17:05 | POINT (24.89994 36.05452) | 2014-01-14 20:30:01 | 4376 |
| Dedos Lidelse | 2014-01-14 20:36:01 | POINT (24.8961 36.06338) | 2014-01-15 07:41:01 | 39900 |
| Dedos Lidelse | 2014-01-15 07:50:52 | POINT (24.87958 36.04803) | 2014-01-15 12:03:01 | 15129 |
| Dedos Lidelse | 2014-01-15 12:11:26 | POINT (24.85237 36.06582) | 2014-01-15 13:42:01 | 5435 |
| Dedos Lidelse | 2014-01-15 13:50:26 | POINT (24.87957 36.04803) | 2014-01-15 17:48:01 | 14255 |
| Dedos Lidelse | 2014-01-15 17:53:39 | POINT (24.89614 36.06337) | 2014-01-15 18:49:01 | 3322 |
| Dedos Lidelse | 2014-01-15 18:52:08 | POINT (24.90177 36.05501) | 2014-01-15 20:37:01 | 6293 |
| Dedos Lidelse | 2014-01-15 20:40:08 | POINT (24.8961 36.06341) | 2014-01-16 07:13:01 | 37973 |
| Dedos Lidelse | 2014-01-16 07:16:46 | POINT (24.88592 36.06365) | 2014-01-16 07:55:01 | 2295 |
| Dedos Lidelse | 2014-01-16 08:02:09 | POINT (24.87957 36.04803) | 2014-01-16 12:17:01 | 15292 |
| Dedos Lidelse | 2014-01-16 12:23:16 | POINT (24.85804 36.05971) | 2014-01-16 13:51:01 | 5265 |
| Dedos Lidelse | 2014-01-16 13:57:16 | POINT (24.87957 36.04803) | 2014-01-16 17:38:01 | 13245 |
| Dedos Lidelse | 2014-01-16 17:43:39 | POINT (24.89612 36.06333) | 2014-01-16 19:01:01 | 4642 |
| Dedos Lidelse | 2014-01-16 19:04:08 | POINT (24.9018 36.05496) | 2014-01-16 19:46:01 | 2513 |
| Dedos Lidelse | 2014-01-16 19:49:08 | POINT (24.8961 36.06338) | 2014-01-17 07:28:01 | 41933 |
| Dedos Lidelse | 2014-01-17 07:31:46 | POINT (24.8859 36.06365) | 2014-01-17 08:00:01 | 1695 |
| Dedos Lidelse | 2014-01-17 08:07:09 | POINT (24.87957 36.04802) | 2014-01-17 11:56:01 | 13732 |
| Dedos Lidelse | 2014-01-17 12:04:26 | POINT (24.85237 36.06584) | 2014-01-17 13:59:01 | 6875 |
| Dedos Lidelse | 2014-01-17 14:07:26 | POINT (24.87957 36.04803) | 2014-01-17 17:40:01 | 12755 |
| Dedos Lidelse | 2014-01-17 17:45:39 | POINT (24.89608 36.06337) | 2014-01-18 12:38:01 | 67942 |
| Dedos Lidelse | 2014-01-18 12:40:56 | POINT (24.90247 36.05582) | 2014-01-18 13:31:01 | 3005 |
| Dedos Lidelse | 2014-01-18 13:31:14 | POINT (24.90183 36.05503) | 2014-01-18 15:36:01 | 7487 |
| Dedos Lidelse | 2014-01-18 15:39:45 | POINT (24.89617 36.06341) | 2014-01-18 18:21:01 | 9676 |
| Dedos Lidelse | 2014-01-18 18:24:05 | POINT (24.89991 36.05443) | 2014-01-18 19:37:01 | 4376 |
| Dedos Lidelse | 2014-01-18 19:40:05 | POINT (24.89612 36.0634) | 2014-01-19 18:25:01 | 81896 |
| Dedos Lidelse | 2014-01-19 18:28:05 | POINT (24.8999 36.05449) | 2014-01-19 20:06:01 | 5876 |
1.1 Frente Birgitta and Osvaldo Hennie often arrive at the location around 1700 hrs and leave at 1900 hrs on weekdays only.
1.2 Frente Birgitta would often drop by the location twice a days. On those days, Frente Birgitta would arrive around 1700hrs and leave at 1900hrs, similar like above and return subsequently to the location after 2000hrs and leave the following morning.
1.3 Osvaldo Hennie only stay overnight at that location 5 times over this period.
Probable deduction is that they were having dinner together at Dedos Lidelse house. An unofficial relationship might exist between Frente Birgitta and Dedos Lidelse. Furthermore, both employees are from the engineering department which might further support the deduction.
denos_loc<-st_set_crs(st_sfc(st_point(c(24.89612,36.06343))),4326)
denos_home<-gps_stop_points %>%
mutate(dist_denos = st_distance(Coordinate, denos_loc),
dist_denos=as.numeric(dist_denos)) %>%
filter(dist_denos<50) %>%
mutate(arrival.date=lubridate::date(Arrival.Time),
arrival.hour=lubridate::hour(Arrival.Time),
departure.date=lubridate::date(Next_move_off_time),
departure.hour=lubridate::hour(Next_move_off_time),
name2=name) %>%
filter(arrival.hour>16) %>%
select(name2, name,arrival.date, departure.date,arrival.hour,departure.hour)%>%
st_drop_geometry() %>% to_lodes_form(denos_home, key="Variables",axes=2:6)
ggplot(denos_home, aes(x=Variables,stratum=stratum,alluvium=alluvium))+
geom_alluvium(aes(fill=name2),discern=FALSE)+
geom_stratum(width=1/3,alpha=.2,discern=FALSE)+
geom_label(stat="stratum",size=2,aes(label=after_stat(stratum)))+
theme(axis.text.y=element_blank(),
axis.title.x=element_blank(),
axis.ticks.y=element_blank(),
axis.text.x = element_text(size=8),
legend.position="none")+
labs(fill="Name")
Figure 11: Alluvial Diagram of time spent at Dedos Lidelse house
5. Do you see evidence of suspicious activity? Identify 1- 10 locations where you believe the suspicious activity is occurring, and why.
Employees car GPS were analysed to investigate for unusual driving patterns. The GPS data was manipulated to derive the two stationary coordinates for each car trip to determine the start and end coordinates. The distance between the two coordinates was tabulated to determine the displacement between the two coordinates.
Figure 12 shows the scatter plot of the cars plotted with distance traveled against driving time with the line of best fit to show the average speed. The plots was split by time period and only showcase weekday data. The points in red are the outliers in each time period in the 1% quantile range. Although the distance is not the actual distance traveled by the car, it will be a good proxy to determine the average speed require to get from one location to another location.
gps_dist <- gps_pts %>%
mutate(distance=st_distance(start.gps, end.gps, by_element = TRUE),
distance=as.numeric(distance), driving.time=as.numeric(driving.time),
id=as.character(id), id=as.numeric(id),
car.type=ifelse(id<100,"Car","Truck"),
speed=round((distance/1000)/(driving.time/60),2), dist=round(distance,2),
time_bin = case_when(
hour(start.time)>=0 & hour(start.time)<6 ~ "Midnight",
hour(start.time)>=6 & hour(start.time)<12 ~ "Morning",
hour(start.time)>=12 & hour(start.time) <18 ~ "Afternoon",
hour(start.time)>=18 ~ "Night"),
time_bin = factor(time_bin,
levels = c("Midnight", "Morning", "Afternoon", "Night")))
speed_0.01<- gps_dist %>% group_by(id,date) %>% summarize(d=sum(dist), dt = sum(driving.time),n=n(),
avg_dist = d/n, avg_time=dt/n) %>% ungroup()
gps_dist_weekday<-gps_dist %>%
filter(!(date %in% c(dmy(11012014),dmy(12012014),dmy(18012014),dmy(19012014)))&
id<100) %>%
st_drop_geometry() %>% group_by(time_bin) %>%
mutate(q=quantile(speed, 0.01),
col=as.character(ifelse(speed<q,1,0))) %>%
ungroup()
speed<-ggplot(gps_dist_weekday, aes(y=dist, x=driving.time))+
geom_point(aes(text=paste("</br>Name:",paste(LastName,FirstName),
"</br>Distance:",dist,"metre",
"</br>Minutes:",driving.time,
"</br>Speed:",speed,"km/hr",
"</br>Date:",date,
"</br>Start time:",start.time,
"</br>End time:",end.time),
color=col)) +
scale_color_manual(values=c("black","red"))+
geom_smooth(method="lm") +
scale_x_continuous(name="Driving Time (minutes)",limits=c(0,NA))+
scale_y_continuous(name="Distance (metre)",limits=c(0,NA))+
theme(legend.position="none")+
facet_grid(~time_bin)
ggplotly(speed,tooltip="text")%>% layout(hoverlabel=list(bgcolor="white"))
Figure 12: Scatter plot of car driving time against distance travelled on weekday
knitr::kable(gps_dist_weekday %>%
filter(col==1) %>%
group_by(id,LastName,FirstName,CurrentEmploymentType) %>%
summarize(n=n()) %>%
filter(n>1) %>%
arrange(desc(n)), "simple",
caption="Table of unusual vehicle movement")
| id | LastName | FirstName | CurrentEmploymentType | n |
|---|---|---|---|---|
| 24 | Mies | Minke | Security | 5 |
| 10 | Campo-Corrente | Ada | Executive | 2 |
| 30 | Resumir | Felix | Security | 2 |
| 35 | Vasco-Pais | Willem | Executive | 2 |
Figure 13 shows the map with GPS lines of Mies Minke and the car stationary coordinates are the blue dots throughout the 14 days. The stationary GPS coordinates of the other employees car were also added as markers on the map. From the map visualisation, we observe that Mies Minke car stop at some unusual location, which were neither his house nor point of interests locations.
1.1 Mies Minke car stopped on the South East of Abila Map near the text: To Port of Abila in the tourist map on 07/01/2014 from 1113 to 1231 hours. Apart from Mies Minke, only Osvaldo Hennie, Ferro Inga, Bodrogi Loreto ever visited the location.
1.2 Mies Minke car stopped somewhere south west of Bean There Done That on 08/01/2014 from 1132 to 1209 hours. Apart from Mies Minke car, only Osvaldo Hennie, Bodrogi Loreto and Ferro Inga car ever visited the location. Bodrogi Loreto car also visited on the same day, 08/01/2014 from 1129 to 1140 hours. The other car GPS reveals that the location was visited on 09/01/2014 and 17/01/2014.
1.3 Mies Minke car stopped near Pilau Street twice, on 10/01/2014 and 16/01/2014. Apart from Mies Minke, only Bodrogi Loreto, Ferro Inga and Osvaldo Hennie car stop at that particular location and Osvaldo Hennie car stop at the location on the same day, 16/01/2014 from 1122 to 1210 hours, which overlapped with Mies Minke car.
1.4 Mies Minke car stopped in the north between Coffee Chameleon and Guy’s Gyros on 09/01/2014 and 14/01/2014. Apart from Mies Minke, only Ferro Inga, Bodrogi Loreto and Osvaldo Hennie car visited the location from 13/01/2014 to 15/01/2014.
1.5 All four locations had the same group of 4 employees car stopping at those locations. Those locations were neither point of interests nor popular locations that other employees would visit. Furthermore, all four employees belongs to Security department and meeting at such unusual locations during weekday lunch time might suggest possible suspicious activity among them.
2.1 Mies Minke car stopped once at SVP/COO Strum Orhan house on 08/01/2014 from 2306 to 09/01/2014 0330 hours. The time period of Mies Minke car at the location is highly suspicious. Furthermore, Bodrogi Loreto car arrives at 0332 hours on 09/01/2014 and left the location at 0723 hrs in the morning.
2.2. Mies Minke car stopped once at SVP/CFO Barranco Ingrid house on the 14/01/2014 from 0331 to 0747 hours. Similarly, Osvaldo Hennie car also stopped at the location earlier from 13/01/2014 from 2308 to 14/01/2014 0330 hrs.
2.3 The group of Security employees that took turn to be at either Executive houses were the same group of suspicious personnel in part 1 of our observation.
3.1 In the earlier sections, we deduce that Mies Minke (car id 24) credit card number is 4434. However, his car GPS data supports the fact that he used credit card number 9951 to perform transactions on 13/01/2014, including the high outlier transaction amount of 10,000 dollars at Frydos Autosupply n’ More.
The four employees in particular Mies Minke are highly suspicious because of their unusual car GPS movement throughout the two weeks data.
gps_path_24 <- gps_sf %>% group_by(id) %>%
summarize(m = mean(timestamp), do_union=FALSE) %>% filter(id==24)%>%
st_cast("LINESTRING")
stop_id24<-gps_stop_points %>% filter(name=="Mies Minke")
tmap_mode("view")
map24<-tm_shape(bgmap) +
tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1,
interpolate=TRUE, max.value=255) +
tm_shape(gps_path_24)+
tm_lines()+
tm_shape(gps_stop_points)+
tm_markers() +
tm_shape(stop_id24)+
tm_dots(col="blue",size=0.1)
tmap_leaflet(map24)
Figure 13: GPS data for Mies Minke
From the map in figure 13, we discovered only five cars ever visited Kronos Capitol. From table 13. we can observe that 4 out of the 5 visits occurred on 18/01/2014 and 3 cars were from the Security department with only Herrero Kanon was from the Engineering department. Furthermore, Herrero Kanon car was stationary at that location from 18/01/2014 12:47:34 till 19/01/2014 12:38:01 where it drove off. The car being stationary at Kronos Capitol overnight was quite suspicious considering that the date was near the disappearance period. A possible deduction could it that Herrero Kanon took either of the 3 other Security cars and left Kronos Capitol before returning the next day to retrieve his vehicle. Another probable deduction could be Herrero Kanon was engaged in some activities during that period inside Kronos Capitol.
capitol<-st_set_crs(st_sfc(st_point(c(24.84936527, 36.05293538))),4326)
gps_capitol <- gps_stop_points %>%
mutate(diff=st_distance(geometry,capitol),
diff=as.numeric(diff)) %>%
filter(diff<50) %>%
st_drop_geometry() %>%
select(name, CurrentEmploymentType,CurrentEmploymentTitle,Arrival.Time,
Next_move_off_time, Time_at_location)
knitr::kable(gps_capitol,"simple",captio="Car stop at Kronos Capitol")
| name | CurrentEmploymentType | CurrentEmploymentTitle | Arrival.Time | Next_move_off_time | Time_at_location |
|---|---|---|---|---|---|
| Vasco-Pais Willem | Executive | Environmental Safety Advisor | 2014-01-11 17:33:05 | 2014-01-12 12:45:01 | 69116 |
| Nubarron Adra | Security | Badging Office | 2014-01-18 13:31:38 | 2014-01-18 18:42:01 | 18623 |
| Bodrogi Loreto | Security | Site Control | 2014-01-18 15:26:01 | 2014-01-18 18:32:01 | 11160 |
| Vann Edvard | Security | Perimeter Control | 2014-01-18 18:24:54 | 2014-01-18 19:37:01 | 4327 |
| Herrero Kanon | Engineering | Geologist | 2014-01-19 12:44:08 | 2014-01-19 13:34:01 | 2993 |
In conclusion, the employees in the Security department are very suspicious based on the GPS and credit card transactions data presented. We would recommend to perform further investigation on them to determine if they were linked to the dispparance in Abila town.